Colander: Sifting Documents for Special Terms
Abstract
We propose to demonstrate Colander, a set of lightweight approaches and an accompanying tool for mining “special terms” from a corpus of documents. In this proposal, we give an overview of three high-level approaches to extracting special terms, introduce metrics for measuring the performance of these approaches, and outline an evaluation methodology. We also provide a high-level description of the tool and its features, and illustrate both the approaches and the tool in an application that mines known and candidate trademarks used in documents.
Publication date: 2009